Abstract

Although operator-valued kernels have re- 
cently received increasing interest in vari- 
ous machine learning and functional data 
analysis problems such as multi-task learn- 
ing or functional regression, little attention 
has been paid to the understanding of their 
associated feature spaces. In this paper, we 
explore the potential of adopting an operator- 
valued kernel feature space perspective for 
the analysis of functional data. We then ex- 
tend the Regularized Least Squares Classifi- 
cation (RLSC) algorithm to cover situations 
where there are multiple functions per obser- 
vation. Experiments on a sound recognition 
problem show that the proposed method out- 
performs the classical RLSC algorithm. 

1. Introduction 

Following the development of multi-task and complex
output learning methods (Pontil & Shawe-Taylor,
2006), operator-valued kernels have recently attracted
considerable attention in the machine learning community
(Micchelli & Pontil, 2005b; Reisert & Burkhardt,
2007; Caponnetto et al., 2008). It turns out that these
kernels lead to a new class of algorithms well suited
for learning multi-output functions. For example, 
in multi-task learning contexts, standard single-task 
kernel learning methods such as support vector ma- 
chines (SVM) and regularization networks (RN) are 
extended to deal with several dependent tasks at once 
by learning a vector-valued function using multi-task 
kernels (Evgeniou et al., 2005). Also, in functional data 
analysis (FDA) where observed continuous data are 
measured over a densely sampled grid and then repre- 
sented by real-valued functions rather than by discrete 
finite dimensional vectors, function-valued reproduc- 
ing kernel Hilbert spaces (RKHS) are constructed from 
nonnegative operator-valued kernels to extend kernel
ridge regression from finite dimensions to the infinite 
dimensional case (Lian, 2007; Kadri et al., 2010). 

While most recent work has focused on study- 
ing operator-valued kernels and their corresponding 
RKHS from the perspective of extending Aronszajn's 
pioneering work (1950) to the vector or function- 
valued case (Micchelli & Pontil, 2005a; Carmeli et al., 
2010; Kadri et al., 2010), in this paper we pay special 
attention to the feature space point of view (Schölkopf
et al., 1999). More precisely, we provide some ideas 
targeted at advancing the understanding of feature 
spaces associated with operator-valued kernels and we 
show how these kernels can design more suitable fea- 
ture maps than those associated with scalar-valued 
kernels, especially when input data are complex, infi- 
nite dimensional objects like curves and distributions. 

In many experiments, the observations consist of a
sample of random functions or curves. So, we adopt in 
this paper a functional data analysis point of view in 
which each curve corresponds to one observation. This 
is an extension of multivariate data analysis where ob- 
servations consist of vectors of finite dimension. For 
an introduction to the field of FDA, the two mono- 
graphs by Ramsay & Silverman (2005; 2002) provide 
a rewarding and accessible overview on foundations 
and applications, as well as a collection of motivating 
examples (Müller, 2005).

To explore the potential of adopting an operator- 
valued kernel feature space approach, we are inter- 
ested in the problem of functional classification in the 
case where there are multiple functions per observa- 
tion. This study is valuable from a variety of perspec- 
tives. Our motivating example is the practical problem 
of sound recognition which is of great importance in 
surveillance and security applications (Dufaux et al.,
2000; Istrate et al., 2006; Rabaoui et al., 2008). This
problem can be tackled by classifying incoming sig- 
nals representing the environmental sounds into vari- 
ous predefined classes. In this setting, a preprocessing 
step consists in applying signal-processing techniques 
to generate a set of features characterizing the signal 
to be classified. These features form a so-called feature 
vector which contains discrete values of different func- 
tional parameters providing information about tem- 
poral, frequential, cepstral and energy characteristics 
of the signal. In standard machine learning methods, 
the feature vector is treated as an element of $\mathbb{R}^n$ obtained
by concatenating samples of the different functional 
features, and this has the drawback of not consid- 
ering any dependencies between different values over 
subsequent time-points within the same functional da- 
tum. Employing these methods implies that permut- 
ing time points arbitrarily, which is equivalent to ex- 
changing the order of the indexes in a multivariate 
vector, should not change the result of statistical anal- 
ysis. Taking into account the inherent sequential na- 
ture of the data and using the dependencies along the 
time-axis should lead to higher quality results (Lee, 
2004). In our work, we use a functional data analysis
approach based on modeling each sound signal
by a vector of functions (in $(L^2)^p$ for example, where
$L^2$ is the space of square integrable functions and $p$
is the number of functional parameters) rather than
by a vector in $\mathbb{R}^n$, with the hope of improving
performance by: (1) considering the relationship between
samples of a function variable and thus the dynamic 
behavior of the functional data, (2) capturing discrim- 
inative characteristics between functions contrary to 
the concatenation procedure. 



During the last decade, various kernel-based meth- 
ods have become very popular for solving classifica- 
tion problems. Among them, the regularized least 
squares classification (RLSC) is a simple regularization 
algorithm which achieves good performance, nearly 
equivalent to that of the well-known Support Vector 
Machines (SVMs) (Rifkin et al., 2003; 2007; Rifkin 
& Klautau, 2004; Zhang & Peng, 2004). By using 
operator-valued kernels, we extend the RLSC algo- 
rithm to cover situations where there are multiple 
functions per observation. One main obstacle for this 
extension involves the inversion of the block operator 
kernel matrix (kernel matrix where the block entries 
are linear operators). In contrast to the situation in 
the multivariate case, this inversion is not always fea- 
sible in infinite dimensional Hilbert spaces. In this 
paper, we attempt to overcome this problem by char- 
acterizing a class of operator-valued kernels and per- 
forming an eigenvalue decomposition of this kernel ma- 
trix. 

The remainder of this paper is organized as follows. 
In section 2, we review concepts of operator-valued
kernels and their corresponding reproducing kernel 
Hilbert spaces and discuss some ideas for understand- 
ing the associated feature maps. Using these kernels, 
we propose a functional regularized least squares clas- 
sification algorithm in section 3. It is an extension of 
the classical RLSC to the case where there are multiple 
functions per observation. The proposed algorithm is 
experimentally evaluated on a sound recognition task 
in section 4. Finally, section 5 presents some conclu- 
sions and future work directions. 

2. Operator-valued kernels and
associated feature spaces

In the machine learning literature, operator-valued 
kernels were first introduced by Micchelli and Pon- 
til (2005b) to deal with the problem of multi-task 
learning. In this context, these kernels, called multi- 
task kernels, are matrix-valued functions and their cor- 
responding vector-valued reproducing kernel Hilbert 
spaces are used to learn multiple tasks simultaneously
(Evgeniou et al., 2005). Most often, multi-task
kernels are constructed from scalar-valued kernels
which are carried over to the vector-valued setting by a
positive definite matrix. More details and some exam- 
ples of multi-task kernels can be found in (Caponnetto 
et al., 2008). In addition, some works have studied 
the construction of such kernels from a functional data 
analysis point of view (continuous data). For example, 
in (Kadri et al., 2010), the authors showed how infi- 
nite dimensional operator-valued kernels can be used 
to perform nonlinear functional regression in the case
where covariates as well as responses are functions. 

Since we are interested in the problem of functional
classification, we use a similar framework to that 
of (Kadri et al., 2010), but we consider the more gen- 
eral case where we have multiple functions as inputs to 
the learning module. Kernel-based learning method- 
ology can be extended directly from the vector-valued 
to the functional-valued case. The principle of this ex- 
tension is to replace vectors by functions and matrices 
by linear operators; scalar products in vector space are 
replaced by scalar products in function space, which is 
usually chosen as the space of square integrable functions
$L^2$ on a suitable domain. In the present paper,
we focus on infinite dimensional operator-valued kernels
and their corresponding function-valued RKHS.

An operator-valued kernel $K : X \times X \to \mathcal{L}(Y)$ is the
reproducing kernel of a Hilbert space of functions from
an input space $X$ which takes values in a Hilbert space
$Y$. $\mathcal{L}(Y)$ is the set of all bounded linear operators from
$Y$ into itself. Input data are represented by a vector
of functions, so we consider the case where $X \subset (L^2)^p$
and $Y \subset L^2$. Function-valued RKHS theory is based
on the one-to-one correspondence between reproducing
kernel Hilbert spaces of function-valued functions
and positive operator-valued kernels. We start by recalling
some basic properties of such spaces. We say
that a Hilbert space $\mathcal{F}$ of functions $f : X \to Y$ has
the reproducing property if, for any $x \in X$ and $y \in Y$,
the linear functional $f \mapsto \langle f(x), y \rangle_Y$ is continuous.
By the Riesz representation theorem it
follows that for a given $x \in X$ and for any choice of
$y \in Y$, there exists an element $h_x^y \in \mathcal{F}$ such that

$$\forall f \in \mathcal{F}, \quad \langle h_x^y, f \rangle_{\mathcal{F}} = \langle f(x), y \rangle_Y$$

We can therefore define the corresponding operator-valued
kernel $K : X \times X \to \mathcal{L}(Y)$ such that

$$\langle K(x_1, x_2) y_1, y_2 \rangle_Y = \langle h_{x_1}^{y_1}, h_{x_2}^{y_2} \rangle_{\mathcal{F}}$$

It follows that

$$\langle h_{x_1}^{y_1}(x_2), y_2 \rangle_Y = \langle h_{x_1}^{y_1}, h_{x_2}^{y_2} \rangle_{\mathcal{F}} = \langle K(x_1, x_2) y_1, y_2 \rangle_Y$$

and thus we obtain the reproducing property

$$\langle K(x, \cdot) y, f \rangle_{\mathcal{F}} = \langle f(x), y \rangle_Y \qquad (1)$$

Consequently, we obtain that K(.,.) is a positive def- 
inite operator-valued kernel as defined below: (see 
proposition 1 in (Micchelli & Pontil, 2005b) for the 
proof) 

Definition: We say that $K(x_1, x_2)$, satisfying
$K(x_1, x_2) = K(x_2, x_1)^*$ (the superscript $*$ indicates
the adjoint operator), is a positive definite operator-valued
kernel if, given an arbitrary finite set of points
$\{(x_i, y_i)\}_{i=1,\dots,n} \in X \times Y$, the corresponding block matrix
$\mathbf{K}$ with $\mathbf{K}_{ij} = \langle K(x_i, x_j) y_i, y_j \rangle_Y$ is positive
semi-definite.
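
The positivity condition in this definition can be checked numerically once everything is discretized. The sketch below is only an illustration under assumptions that are not part of the text at this point: the kernel is taken to be of the separable form $G(x_i, x_j)\,T$ with a Gaussian scalar kernel $G$ and $T$ an integral operator with kernel $e^{-|t-s|}$, the inputs are random finite dimensional stand-ins for functional data, and the $L^2([0,1])$ inner product is replaced by a Riemann sum on a midpoint grid. The helper names `gaussian_gram` and `block_gram` are hypothetical.

```python
import numpy as np

def gaussian_gram(X, gamma=1.0):
    """Scalar Gaussian kernel matrix G[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

def block_gram(G, T, Y, h):
    """Matrix M[i, j] = <K(x_i, x_j) y_i, y_j>_Y for the assumed K = G * T.

    T is the operator kernel sampled on the grid and h is the grid step,
    so integrals over [0, 1] are approximated by Riemann sums.
    """
    TY = (Y @ T.T) * h              # row i holds (T y_i) sampled on the grid
    return G * ((TY @ Y.T) * h)     # entrywise product with <T y_i, y_j>

rng = np.random.default_rng(0)
n, m = 6, 40                        # n observations, m grid points (assumed sizes)
h = 1.0 / m
grid = (np.arange(m) + 0.5) * h     # midpoint grid on [0, 1]
X = rng.normal(size=(n, 3))         # stand-in inputs
Y = rng.normal(size=(n, m))         # stand-in functions y_i sampled on the grid
T = np.exp(-np.abs(grid[:, None] - grid[None, :]))

M = block_gram(gaussian_gram(X), T, Y, h)
min_eig = np.linalg.eigvalsh((M + M.T) / 2).min()
print("smallest eigenvalue is non-negative (up to round-off):", min_eig >= -1e-10)
```

For this particular choice the matrix is an entrywise product of two positive semi-definite matrices, so a non-negative smallest eigenvalue is what the Schur product theorem predicts.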

Importantly, the converse is also true. Any positive
operator-valued kernel $K(x_1, x_2)$ gives rise to an
RKHS $\mathcal{F}_K$, which can be constructed by considering
the space of function-valued functions $f$ having
the form $f(\cdot) = \sum_{i=1}^n K(x_i, \cdot) y_i$ and taking completion
with respect to the inner product given by
$\langle K(x_1, \cdot) y_1, K(x_2, \cdot) y_2 \rangle_{\mathcal{F}} = \langle K(x_1, x_2) y_1, y_2 \rangle_Y$.

In the following, we present an example of function- 
valued RKHS with functional inputs and the associ- 
ated operator-valued kernel. 

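Example. Let $X = H$ and $Y = L^2(\Omega)$, where $H$ is
the Hilbert space of constants in $[0, 1]$ and $L^2(\Omega)$ the
space of square integrable functions on $\Omega$. We denote
by $\mathcal{M}$ the space of $L^2(\Omega)$-valued functions on $H$ whose
norm

$$\|g\|_{\mathcal{M}}^2 = \int_\Omega \int_H [g(v)(x)]^2 \, dv \, dx$$

is finite. Let $(\mathcal{F}; \langle \cdot, \cdot \rangle_{\mathcal{F}})$ be the space of functions from $H$ to
$L^2(\Omega)$ such that:

$$\mathcal{F} = \Big\{ f : \exists f' = \frac{df(v)}{dv} \in \mathcal{M},\ f(u) = \int_0^u f'(v) \, dv \Big\}$$

$$\langle f_1, f_2 \rangle_{\mathcal{F}} = \langle f_1', f_2' \rangle_{\mathcal{M}}$$

$\mathcal{F}$ is a RKHS with kernel $K(u, v) = M_{\varphi(u,v)}$. $M_\varphi$ is the
multiplication operator associated with the function $\varphi$,
where $\varphi(u, v)$ is equal to $u$ if $u(x) < v(x)\ \forall x \in \Omega$ and
$v$ otherwise. It is easy to check that $K$ is Hermitian
and nonnegative. Now we show that the reproducing
property holds for any $f \in \mathcal{F}$, $w \in L^2(\Omega)$ and $u \in H$

$$\begin{aligned}
\langle f, K(u, \cdot) w \rangle_{\mathcal{F}} &= \langle f', [K(u, \cdot) w]' \rangle_{\mathcal{M}} \\
&= \int_\Omega \int_H [f'(v)](x) \, [K(u, v) w]'(x) \, dv \, dx \\
&= \int_\Omega \int_0^u [f'(v)](x) \, w(x) \, dv \, dx = \int_\Omega [f(u)](x) \, w(x) \, dx \\
&= \langle f(u), w \rangle_{L^2(\Omega)}
\end{aligned}$$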

Similar to the scalar case, operator-valued kernels provide
an elegant way of dealing with nonlinear algorithms
by reducing them to linear ones in some feature
space $\mathcal{F}$ nonlinearly related to input space. A feature
map associated with an operator-valued kernel $K$ is a
continuous function

$$\Phi : X \times Y \to \mathcal{L}(X, Y)$$

such that, for every $x_1, x_2 \in X$ and $y_1, y_2 \in Y$

$$\langle K(x_1, x_2) y_1, y_2 \rangle_Y = \langle \Phi(x_1, y_1), \Phi(x_2, y_2) \rangle_{\mathcal{L}(X,Y)}$$

where $\mathcal{L}(X, Y)$ is the set of mappings from $X$ to $Y$.
By virtue of this property, $\Phi$ is called a feature map
associated with $K$. Furthermore, from (1), it follows
that in particular

$$\langle K(x_1, \cdot) y_1, K(x_2, \cdot) y_2 \rangle_{\mathcal{F}} = \langle K(x_1, x_2) y_1, y_2 \rangle_Y$$

which means that any operator-valued kernel admits a
feature map representation with a feature space $\mathcal{F} \subset \mathcal{L}(X, Y)$,
and corresponds to a dot product in another
space.

From this feature map perspective, we study the geometry
of a feature space associated with an operator-valued
kernel and we compare it with the one obtained
by a scalar-valued kernel. More precisely, we consider
two reproducing kernel Hilbert spaces (RKHS) $\mathcal{F}$ and
$\mathcal{H}$. $\mathcal{F}$ is a RKHS of function-valued functions on $X$
with values in $Y$, where $X \subset (L^2)^p$ and $Y \subset L^2$, and $K$ is
the reproducing operator-valued kernel of $\mathcal{F}$. $\mathcal{H}$ is also
a RKHS, but of scalar-valued functions on $X$ with values
in $\mathbb{R}$, and $k$ its reproducing real-valued kernel. The
mappings $\Phi_K$ and $\Phi_k$ associated, respectively, with the
kernels $K$ and $k$ are defined as follows

$$\Phi_K^y : (L^2)^p \to \mathcal{L}((L^2)^p, L^2), \quad x \mapsto K(x, \cdot) y$$

and

$$\Phi_k : (L^2)^p \to \mathcal{L}((L^2)^p, \mathbb{R}), \quad x \mapsto k(x, \cdot)$$

These feature maps can be seen as a mapping of the
input data $x_i$, which are vectors of functions in $(L^2)^p$,
into a feature space in which the dot product can be
computed using the kernel functions. This idea leads
to the design of nonlinear methods based on linear ones in
the feature space. In a supervised classification problem
for example, since kernels can map input data
into a higher dimensional space, kernel methods deal
with this problem by finding a linear separation in the
feature space between data which cannot be separated
linearly in the input space. We now compare the dimension
of feature spaces obtained by the maps $\Phi_K$
and $\Phi_k$. To do this, we adopt a functional data analysis
point of view where observations are composed of
sets of functions. Direct understanding of this FDA 
viewpoint comes from the consideration of the "atom" 
of a statistical analysis. In a basic course in statis- 
tics, atoms are "numbers" , while in multivariate data 
analysis the atoms are vectors and methods for under- 
standing populations of vectors are the focus. FDA 
can be viewed as the generalization of this, where the 
atoms are more complicated objects, such as curves, 
images or shapes represented by functions (Zhao et al., 
2004). Based on this, the dimension of the input space 
is $p$ since $x_i \in (L^2)^p$ is a vector of $p$ functions. The
feature space obtained by the map $\Phi_k$ is a space of
functions, so its dimension from a FDA point of view
is one. The map $\Phi_K$ projects the input data into a
space of operators $\mathcal{L}(X, Y)$. This means that using
the operator-valued kernel $K$ corresponds to mapping
the functional data $x_i$ into a higher, possibly infinite,
dimensional space $(L^2)^d$ with $d \to \infty$. In a binary
functional classification problem, we have higher prob- 
ability to achieve linear separation between the classes 
by projecting the functional data into a higher dimen- 
sional feature space rather than into a lower one, that 
is why we think that it is more suitable to use operator- 
valued than scalar- valued kernels in this context. 

3. Functional regularized least squares 
classification 

In this section, we show how to extend the regular- 
ized least squares classification algorithm (RLSC) to 
functional contexts using operator-valued kernels. To 
use these kernels for a classification problem, we con- 
sider the labels to be functions in some function space 
rather than real values as usual. The functional classification
problem can then be framed as that of learning
a function-valued function $f : X \to Y$ where $X \subset (L^2)^p$
and $Y \subset L^2$. The RLSC algorithm (Rifkin et al.,
2003) is based on solving a Tikhonov minimization
problem associated with a square loss function, and
then an estimate $f^*$ of $f$ in a Hilbert space $\mathcal{F}$ with
reproducing operator-valued kernel $K : X \times X \to \mathcal{L}(Y)$
is obtained by minimizing

$$f^* = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^n \|y_i - f(x_i)\|_Y^2 + \lambda \|f\|_{\mathcal{F}}^2 \qquad (2)$$

By the representer theorem (Micchelli & Pontil, 2005a;
Kadri et al., 2010), the solution of this problem has the
following form

$$f^*(x) = \sum_{j=1}^n K(x, x_j) \beta_j, \quad \beta_j \in Y \qquad (3)$$

Substituting (3) in (2), we come up with the following
minimization over the scalar-valued functions $\beta_i$ rather
than the function-valued function $f$

$$\min_{\beta_v \in (Y)^n} \sum_{i=1}^n \Big\| y_i - \sum_{j=1}^n K(x_i, x_j) \beta_j \Big\|_Y^2 + \lambda \sum_{i,j=1}^n \langle K(x_i, x_j) \beta_i, \beta_j \rangle_Y \qquad (4)$$

$\beta_v$ is the vector of functions $(\beta_i)_{i=1,\dots,n} \in (L^2)^n$. The
problem (4) can be solved in three ways. Assum- 
ing that the observations are made on a regular grid 
$\{t_1, \dots, t_m\}$, one can first discretize the functions $x_i$
and $y_i$ and then solve the problem using multivariate
data analysis techniques. However, this has the
drawback, as well known in the FDA literature, of 
not considering the relationships that exist between 
samples. The second way consists in considering the 
output space $Y$ to be a scalar-valued reproducing kernel
Hilbert space. In this case, the functions $\beta_i$ can be
approximated by a linear combination of a scalar kernel,
$\beta_i = \sum_{l=1}^m \alpha_{il} k(s_l, \cdot)$, and then the problem (4) becomes
a minimization problem over the real values $\alpha_{il}$.
Another possible way to solve the minimization (4) is
to compute its derivative using the directional derivative
and setting the result to zero to find an analytic
solution of the problem. It follows that $\beta_v$ satisfies the
system of linear operator equations



$$(\mathbf{K} + \lambda I) \beta_v = y_v \qquad (5)$$

where $\mathbf{K} = [K(x_i, x_j)]_{i,j=1}^n$ is an $n \times n$ block operator
matrix ($\mathbf{K}_{ij} \in \mathcal{L}(Y)$) and $y_v$ the vector of functions
$(y_i)_{i=1,\dots,n} \in (L^2)^n$.

Algorithm 1 Functional RLSC

  Input
    data $x_i \in (L^2([0, 1]))^p$, size $n$
    labels $y_i \in L^2([0, 1])$, size $n$
  Eigendecomposition of $\mathbf{G}$
    $\mathbf{G} = (G(x_i, x_j))_{i,j=1}^n \in \mathbb{R}^{n \times n}$
    eigenvalues $\alpha_i \in \mathbb{R}$, size $n$
    eigenvectors $v_i \in \mathbb{R}^n$, size $n$
  Eigendecomposition of $T$
    $T \in \mathcal{L}(Y)$
    Initialize $k$: number of eigenfunctions
    eigenvalues $\delta_i \in \mathbb{R}$, size $k$
    eigenfunctions $w_i \in L^2([0, 1])$, size $k$
  Eigendecomposition of $\mathbf{K} = \mathbf{G} \otimes T$
    $\mathbf{K} = (K(x_i, x_j))_{i,j=1}^n \in (\mathcal{L}(Y))^{n \times n}$
    eigenvalues $\theta_i \in \mathbb{R}$, size $n \times k$
      $\theta = \alpha \otimes \delta$
    eigenfunctions $z_i \in (L^2([0, 1]))^n$, size $n \times k$
      $z = v \otimes w$
  Solution of (4): $\beta = (\mathbf{K} + \lambda I)^{-1} y$
    Initialize $\lambda$: regularization parameter

In this work, we are interested in this third approach
which extends the classical RLSC algorithm to the functional
data analysis domain. One main obstacle for
this extension is the inversion of the block operator
kernel matrix $\mathbf{K}$. Block operator matrices generalize
block matrices to the case where the block entries are
linear operators between infinite dimensional Hilbert
spaces. In contrast to the situation in the multivariate
case, inverting such matrices is not always feasible
in infinite dimensional spaces. To overcome this
problem, we study the eigenvalue decomposition of a
class of block operator kernel matrices obtained from
operator-valued kernels having the following form

$$K(x_i, x_j) = G(x_i, x_j) \, T, \quad \forall x_i, x_j \in X \qquad (6)$$

where $G$ is a scalar-valued kernel and $T$ is an operator
in $\mathcal{L}(Y)$. This kernel construction is adapted
from (Micchelli & Pontil, 2005a;b). Choosing $T$ depends
on the context. For multi-task kernels, $T$ is a
finite dimensional matrix which models relations between
tasks. In FDA, Lian (2007) suggested the use of
the identity operator, while Kadri et al. (2010) showed
that it is more useful to choose operators other
than the identity that are able to take into account functional
properties of the input and output spaces. They
introduced a functional extension of the Gaussian kernel
based on the multiplication operator. In this work,
we are interested in kernels constructed from the integral
operator. This seems to be a reasonable choice
since functional linear models (see Eq. (7)) are based
on this operator (Ramsay & Silverman, 2005)

$$y(s) = \alpha(s) + \int x(t) \, \nu(s, t) \, dt \qquad (7)$$



where $\alpha$ and $\nu$ are the functional parameters of the
model. So we consider the following positive definite
operator-valued kernel

$$(K(x_i, x_j) y)(t) = G(x_i, x_j) \int_\Omega e^{-|t-s|} \, y(s) \, ds \qquad (8)$$



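where $y \in Y$ and $\{s, t\} \in \Omega = [0, 1]$. Note that a similar
kernel was proposed in (Caponnetto et al., 2008) for
linear spaces of functions from $\mathbb{R}$ to $Y$. The $n \times n$ block
operator kernel matrix $\mathbf{K}$ of operator-valued kernels having
the form (6) can be expressed by a Kronecker product
between the matrix $\mathbf{G} = (G(x_i, x_j))_{i,j=1}^n$ and
the operator $T \in \mathcal{L}(Y)$

$$\mathbf{K} = \begin{pmatrix} G(x_1, x_1) T & \cdots & G(x_1, x_n) T \\ \vdots & \ddots & \vdots \\ G(x_n, x_1) T & \cdots & G(x_n, x_n) T \end{pmatrix} = \mathbf{G} \otimes T$$

In this case, the eigendecomposition of the matrix $\mathbf{K}$
can be obtained from the eigendecompositions of $\mathbf{G}$ and
$T$ (see Algorithm 1). Let $\theta_i$ and $z_i$ be, respectively, the
eigenvalues and the eigenfunctions of $\mathbf{K}$. The inverse
operator $\mathbf{K}^{-1}$ is given by

$$\mathbf{K}^{-1} c = \sum_i \theta_i^{-1} \langle c, z_i \rangle z_i, \quad \forall c \in (L^2)^n$$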

Now we are able to solve the system of linear operator
equations (5) and the functions $\beta_i$ can be computed
from eigenvalues and eigenfunctions of the matrix $\mathbf{K}$,
as described in Algorithm 1.
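
The eigendecomposition route of Algorithm 1 can be sketched in a discretized setting. This is a minimal illustration under assumptions of our own, not the authors' implementation: the operator $T$ is replaced by an $m \times m$ symmetric matrix obtained by sampling its kernel on a grid (with the grid step folded in), all $m$ eigenpairs are kept instead of a truncated number $k$, and the function name `functional_rlsc_fit` is hypothetical. The result is compared with a direct solve of the Kronecker system.

```python
import numpy as np

def functional_rlsc_fit(G, T, Yv, lam):
    """Solve (G kron T + lam * I) beta = y via eigendecompositions of G and T.

    G   : (n, n) symmetric PSD scalar kernel matrix
    T   : (m, m) symmetric PSD matrix, an assumed discretization of the operator T
    Yv  : (n, m) array, row i holds the label function y_i sampled on the grid
    lam : regularization parameter
    """
    alpha, V = np.linalg.eigh(G)        # G = V diag(alpha) V^T
    delta, W = np.linalg.eigh(T)        # T = W diag(delta) W^T
    theta = np.outer(alpha, delta)      # eigenvalues of G kron T
    coeffs = V.T @ Yv @ W               # coordinates of y in the basis v_i kron w_j
    return V @ (coeffs / (theta + lam)) @ W.T

# Small synthetic check against a direct solve (all data here is made up).
rng = np.random.default_rng(1)
n, m, lam = 5, 30, 0.1
X = rng.normal(size=(n, 4))
G = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))
grid = (np.arange(m) + 0.5) / m
T = np.exp(-np.abs(grid[:, None] - grid[None, :])) / m
Yv = rng.normal(size=(n, m))

beta = functional_rlsc_fit(G, T, Yv, lam)
direct = np.linalg.solve(np.kron(G, T) + lam * np.eye(n * m), Yv.ravel())
print("matches direct solve:", np.allclose(beta.ravel(), direct, atol=1e-8))
```

The point of the decomposition is cost: the direct solve works on an $nm \times nm$ matrix, while the sketch only decomposes one $n \times n$ and one $m \times m$ matrix.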

4. Experiments

Our experiments are based on a sound recognition 
task. The performance of the functional RLSC algo- 
rithm, described in section 3, is evaluated on a data set 
of sounds collected from commercial databases which 
include sounds ranging from screams to explosions, 
such as gun shots or glass breaking, and compared 
with the RLSC method (Rifkin et al., 2003).

Many previous works in the context of sound recog- 
nition have concentrated on classifying environmental 
sounds other than speech and music (Dufaux et al., 
2000; Peltonen et al., 2002). Such sounds are extremely
versatile, including signals generated in domestic,
business, and outdoor environments. A system
that is able to recognize such sounds may be of
great importance for surveillance and security applica- 
tions (Istrate et al., 2006; Rabaoui et al., 2008). The 
classification of a sound is usually performed in two 
steps. First, a pre-processor applies signal processing 
techniques to generate a set of features characterizing 
the signal to be classified. Then, in the feature space, 
a decision rule is implemented to assign a class to a 
pattern. 

4.1. Database description 

As in (Rabaoui et al., 2008), the major part of the 
sound samples used in the recognition experiments is 
taken from two sound libraries (Leonardo Software; 
Real World Computing Partnership, 2000). All signals
in the database have a 16 bits resolution and are sam- 
pled at 44100 Hz, enabling both good time resolution 
and a wide frequency band, which are both necessary 
to cover harmonic as well as impulsive sounds. The 
selected sound classes are given in Table 1, and they 
are typical of surveillance applications. The number 
of items in each class is deliberately not equal. 

Note that this database includes impulsive sounds and 
harmonic sounds such as phone rings (C6) and chil- 
dren voices (C7). These sounds are quite likely to 
be recorded by a surveillance system. Some classes 
sound very similar to a human listener: in particular, 
explosions (C4) are pretty similar to gunshots (C2). 
Glass breaking sounds include both bottle and win- 
dow breaking situations. Phone rings are either elec- 
tronic or mechanic alarms. Temporal representations 
and spectrograms of some sounds are depicted in Fig- 
ure 1 and 2. Power spectra are extracted through the 
Fast Fourier Transform (FFT) every 10 ms from 25 ms 
frames. They are represented vertically at the corresponding
frame indexes. The frequency range of interest
is between 0 and 22 kHz. A lighter shade indicates
a higher power value. These figures show that in the
considered database we can have both: (1) many similarities
between some sounds belonging to different
classes, (2) diversities within the same sound class.

Table 1. Classes of sounds and number of samples in the
database used for performance evaluation.

Classes           Number   Train   Test   Total   Duration (s)
Human screams     C1       40      25     65      167
Gunshots          C2       30      25     55      97
Glass breaking    C3       48      25     73      123
Explosions        C4       41      21     62      180
Door slams        C5       50      25     75      96
Phone rings       C6       34      17     51      107
Children voices   C7       58      29     87      140
Machines          C8       40      20     60      184
Total                      327     181    508     18 min 14 s

4.2. Results 

Following (Rifkin & Klautau, 2004), the 1-vs-all multi-class
classifier is selected in these experiments. So we
train $N$ (number of classes) different binary classifiers,
each one trained to distinguish the data in a single
class from the examples in all remaining classes. We
run the $N$ classifiers to classify a new example. In section 3,
we showed that operator-valued kernels can be
used in a classification problem by considering the labels
$y_i$ to be functions in some function space rather
than real values. Similarly to the scalar case, a natural
choice for $y_i$ would seem to be the Heaviside step
function in $L^2([0, 1])$ scaled by a real number. The
operator-valued kernel used is based on the integral
operator as defined in (8). Eigenvalues $\delta_i$ and eigenfunctions
$w_i$ associated with this kernel are equal to
$\frac{2}{1 + \mu_i^2}$ and $\mu_i \cos(\mu_i x) + \sin(\mu_i x)$, respectively, where
$\mu_i$ are solutions of the equation $\cot \mu = \frac{1}{2}\left(\mu - \frac{1}{\mu}\right)$.
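
The closed-form eigenvalues quoted above can be compared with a numerical discretization of the integral operator with kernel $e^{-|t-s|}$ on $[0, 1]$. The following is a rough check under our own assumptions, not part of the original experiments: the roots $\mu_i$ are located by bisection, one per interval $(k\pi, (k+1)\pi)$, and the operator is approximated by a Nyström matrix on a midpoint grid of 400 points. All function names are hypothetical.

```python
import math
import numpy as np

def mu_roots(count):
    """First `count` positive roots of cot(mu) = (mu - 1/mu) / 2, by bisection."""
    def f(mu):
        return math.cos(mu) / math.sin(mu) - 0.5 * (mu - 1.0 / mu)
    roots = []
    for k in range(count):
        # f decreases from +inf to -inf on (k*pi, (k+1)*pi): exactly one root there
        lo, hi = k * math.pi + 1e-9, (k + 1) * math.pi - 1e-9
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots

def analytic_eigenvalues(count):
    """Eigenvalues 2 / (1 + mu_i^2) from the closed form in the text."""
    return [2.0 / (1.0 + mu * mu) for mu in mu_roots(count)]

def nystrom_eigenvalues(count, m=400):
    """Largest eigenvalues of the operator discretized on a midpoint grid."""
    grid = (np.arange(m) + 0.5) / m
    A = np.exp(-np.abs(grid[:, None] - grid[None, :])) / m
    return np.sort(np.linalg.eigvalsh(A))[::-1][:count]

analytic = analytic_eigenvalues(3)
numeric = nystrom_eigenvalues(3)
print("closed form:", analytic)
print("discretized:", numeric)
```

In Algorithm 1 these values would play the role of the eigenvalues $\delta_i$ of $T$, so the operator never has to be discretized when the closed form is available.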

The adopted sound data processing scheme is the following.
Let $X$ be the set of training sounds, divided into
$N$ classes denoted $C_1, \dots, C_N$. Each class contains $m_i$
sounds, $i = 1, \dots, N$. Sound number $j$ in class $C_i$ is
denoted $s_{i,j}$ ($i = 1, \dots, N$, $j = 1, \dots, m_i$). The pre-processor
converts a recorded acoustic signal $s_{i,j}$ into a
time/frequency localized representation. In multivariate
methods, this representation is obtained by splitting
the signal $s_{i,j}$ into $T_{i,j}$ overlapping short frames
and computing a vector of features $z_{t,i,j}$, $t = 1, \dots, T_{i,j}$,
which characterize each frame. Since the pre-processor
produces a series of continuous time-localized features, it will
be useful to take into account the relationships between
feature samples along the time axis and consider
dependencies between features. That is why we use a
FDA-based approach in which features representing a
sound are modeled by functions $z_{i,j}(t)$.

In this work, Mel Frequency Cepstral Coefficients
(MFCCs) are used to describe the spectral shape of
each signal. These coefficients are obtained using a 23
channel Mel filterbank and a Hamming analysis window
of length 25 ms with 50% overlap. We choose
to use 13 MFCC features and the energy parameter
measured along the sound signal. So, each sound is
characterized by 14 functional parameters: 13 cepstral
functions and one energy function.

Performance of the Functional RLSC based classifier
is compared to the results obtained by the RLSC algorithm,
see Table 2 and 3. The performance is measured
as the percentage of sounds correctly
recognized and it is given by $(W_r / T_n) \times 100\%$, where
$W_r$ is the number of well recognized sounds and $T_n$
is the total number of sounds to be recognized. The
use of the Functional RLSC is fully justified by the
results presented here, as it yields consistently lower
error rates and a high classification accuracy for the
major part of the sound classes.

Figure 1. Structural similarities between two different
classes.

Figure 2. Structural diversity inside the same sound class
and between classes.

Table 3. Confusion Matrix obtained when using the Regularized
Least Squares Classification (RLSC) algorithm

        C1      C2      C3      C4      C5      C6      C7      C8
C1      92      4       4.76    -       5.27    11.3    6.89    -
C2      -       52      -       14      -       2.7     -       -
C3      -       20      76.2    -       -       -       17.24   5
C4      -       16      -       66      -       -       -       -
C5      4       8       -       4       84.21   -       6.8     -
C6      4       -       -       -       10.52   86      -       -
C7      -       -       -       8       -       -       69.07   -
C8      -       -       19.04   8       -       -       -       95

Total Recognition Rate = 77.56%
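
As a small consistency check on the reported figures, the sketch below implements the rate $(W_r / T_n) \times 100\%$ and, separately, averages the diagonal entries of the confusion matrix that accompanies the 77.56% total recognition rate. The observation that the unweighted mean of the diagonal reproduces the reported total is ours; the text does not state how the total was aggregated, and the function name `recognition_rate` is hypothetical.

```python
def recognition_rate(well_recognized, total):
    """Recognition rate (W_r / T_n) x 100%, as defined in the text."""
    return 100.0 * well_recognized / total

# Diagonal of the confusion matrix: per-class recognition rates (in percent)
# for classes C1..C8, copied from the table in this section.
diagonal = [92, 52, 76.2, 66, 84.21, 86, 69.07, 95]

# Unweighted mean over classes (our assumption about the aggregation).
mean_rate = sum(diagonal) / len(diagonal)
print(f"mean per-class rate: {mean_rate:.2f}%")

# Usage of the definition on made-up counts: 3 of 4 sounds recognized.
example_rate = recognition_rate(3, 4)
```

With unequal class sizes, the count-based rate and the per-class mean generally differ, so the two should not be used interchangeably when comparing classifiers.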

5. Conclusion 

This paper has put forward the idea that by viewing 
operator-valued kernels from a feature map perspec- 
tive, we can design more general kernel methods well- 
suited for complex data. Based on this, we have ex- 
tended the regularized least squares classification algo- 
rithm to functional data analysis contexts where input 
data are real-valued functions rather than finite dimensional
vectors. Through experiments on sound recognition,
we have shown that the proposed approach is
efficient and improves on the classical RLSC method on
a sound classification dataset. Further investigations
will consider larger classes of operator-valued kernels. 
It would also be interesting to study learning methods 
for choosing the operator-valued kernel. 